
Add first version of write-ahead log #376

Merged
CGodiksen merged 118 commits into main from dev/write-ahead-log on Mar 27, 2026


@CGodiksen (Collaborator) commented Mar 13, 2026

This PR implements a write-ahead log (WAL) in modelardb_storage that ensures durability and crash recovery for ingested time series data. The WAL logs uncompressed data on a per-table basis before it enters the storage engine and replays unpersisted batches on startup to prevent data loss. Each table's WAL is currently split into segment files based on batch count. Note that this PR contains only the first version of the WAL, so several planned features are not implemented yet.
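As a rough illustration of batch-count segmentation, the following std-only Rust sketch shows a per-table log that assigns increasing batch IDs and rotates to a new segment once the active one holds a fixed number of batches. `Segment`, `TableLog`, and `max_batches_per_segment` are hypothetical names, not ModelarDB's `WriteAheadLog` API, and batches are kept in memory as raw bytes instead of being written to IPC files.

```rust
/// A segment holding a contiguous range of batch IDs.
/// Hypothetical type for illustration; not ModelarDB's `WriteAheadLogFile`.
#[derive(Debug)]
pub struct Segment {
    /// Batch ID of the first batch in this segment.
    pub start_batch_id: u64,
    /// Serialized batches; a real WAL would write these to disk.
    pub batches: Vec<Vec<u8>>,
}

/// Hypothetical per-table log that rotates after a fixed number of batches.
pub struct TableLog {
    pub segments: Vec<Segment>,
    next_batch_id: u64,
    max_batches_per_segment: usize,
}

impl TableLog {
    pub fn new(max_batches_per_segment: usize) -> Self {
        assert!(max_batches_per_segment > 0, "segments must hold at least one batch");
        Self {
            // Start with one empty active segment.
            segments: vec![Segment { start_batch_id: 0, batches: Vec::new() }],
            next_batch_id: 0,
            max_batches_per_segment,
        }
    }

    /// Logs a batch before it is handed to the storage engine and returns its batch ID.
    pub fn append(&mut self, batch: Vec<u8>) -> u64 {
        let batch_id = self.next_batch_id;

        // Rotate: open a new segment once the active one is full.
        let active_is_full = self
            .segments
            .last()
            .map_or(true, |segment| segment.batches.len() >= self.max_batches_per_segment);
        if active_is_full {
            self.segments.push(Segment { start_batch_id: batch_id, batches: Vec::new() });
        }

        self.segments
            .last_mut()
            .expect("at least one segment exists")
            .batches
            .push(batch);
        self.next_batch_id += 1;
        batch_id
    }
}
```

For example, with `max_batches_per_segment` set to 2, appending five batches yields three segments starting at batch IDs 0, 2, and 4.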

Features that will be implemented in future PRs include:

  • Controlling the threshold for segmentation and segmenting based on batch size
  • Making it possible to disable the WAL
  • Handling spilled buffers
  • Explicitly handling data transfer and truncate
  • Integration testing with fail-rs
  • General optimizations


Copilot AI left a comment

Pull request overview

This PR implements a write-ahead log (WAL) for ModelarDB that ensures durability and crash recovery for ingested time series data. The WAL logs uncompressed data on a per-table basis using segmented IPC streaming files before data enters the storage engine. On startup, unpersisted batches are replayed from the WAL to prevent data loss. Batch IDs are tracked through the entire pipeline from ingestion to compression to persistence, and are recorded in Delta Lake commit metadata for checkpointing.
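The startup replay described above amounts to filtering the logged batches against the batch IDs that have already been committed. A minimal std-only sketch follows, assuming the committed IDs are stored as a JSON array string such as "[0, 2]" in the commit metadata; that encoding and the `LoggedBatch`, `batches_to_replay`, and `parse_batch_ids` names are hypothetical. The PR adds serde_json as a dependency, but this sketch avoids external crates and uses a small hand-written parser instead.

```rust
use std::collections::HashSet;

/// Hypothetical WAL entry: a batch ID and its serialized uncompressed data.
#[derive(Debug, Clone, PartialEq)]
pub struct LoggedBatch {
    pub batch_id: u64,
    pub data: Vec<u8>,
}

/// Returns the batches that must be replayed on startup, in log order.
/// `persisted_batch_ids` stands in for the IDs recovered from commit metadata.
pub fn batches_to_replay(
    logged: &[LoggedBatch],
    persisted_batch_ids: &HashSet<u64>,
) -> Vec<LoggedBatch> {
    logged
        .iter()
        .filter(|batch| !persisted_batch_ids.contains(&batch.batch_id))
        .cloned()
        .collect()
}

/// Parses a JSON-array-style value such as "[0, 1, 2]" (assumed encoding).
pub fn parse_batch_ids(value: &str) -> Result<HashSet<u64>, std::num::ParseIntError> {
    value
        .trim()
        .trim_start_matches('[')
        .trim_end_matches(']')
        .split(',')
        .map(str::trim)
        // An empty array produces a single empty item, which is skipped.
        .filter(|item| !item.is_empty())
        .map(str::parse::<u64>)
        .collect()
}
```

With batches 0 to 3 logged and commit metadata containing "[0, 2]", only batches 1 and 3 are replayed.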

Changes:

  • Added WriteAheadLog and WriteAheadLogFile types implementing per-table segmented WAL with append, rotate, persist-tracking, and replay capabilities
  • Threaded batch IDs through the entire data pipeline (IngestedDataBuffer → UncompressedDataBuffer → CompressedSegmentBatch → CompressedDataBuffer → Delta Lake commit metadata) and replaced the previous spilled-buffer recovery with WAL-based replay
  • Extended DeltaTableWriter to store batch IDs in Delta Lake commit metadata, and added InvalidState error variant for WAL-specific error conditions
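To make the batch ID threading concrete, here is a simplified sketch in which each buffer carries the set of batch IDs that contributed rows to it, compression passes that set along unchanged, and the IDs of everything being saved are encoded into a commit metadata map. The types are simplified stand-ins for `UncompressedDataBuffer`, `CompressedSegmentBatch`, and the Delta Lake commit metadata, not the PR's actual code; the `"batch_ids"` key and the JSON-array encoding are assumptions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical uncompressed buffer that remembers which WAL batches contributed rows.
#[derive(Default)]
pub struct UncompressedBuffer {
    pub rows: Vec<f32>,
    pub batch_ids: BTreeSet<u64>,
}

impl UncompressedBuffer {
    pub fn insert(&mut self, batch_id: u64, values: &[f32]) {
        self.rows.extend_from_slice(values);
        self.batch_ids.insert(batch_id);
    }
}

/// Hypothetical compressed output; the batch IDs travel with the data.
pub struct CompressedBatch {
    pub num_values: usize,
    pub batch_ids: BTreeSet<u64>,
}

/// Stand-in for compression: real compression is omitted, only the
/// hand-off of batch IDs from the uncompressed buffer is shown.
pub fn compress(buffer: UncompressedBuffer) -> CompressedBatch {
    CompressedBatch {
        num_values: buffer.rows.len(),
        batch_ids: buffer.batch_ids,
    }
}

/// Builds commit metadata for one write from all compressed batches being saved.
/// The "batch_ids" key and JSON-array encoding are assumptions for illustration.
pub fn commit_metadata(batches: &[CompressedBatch]) -> HashMap<String, String> {
    let all_ids: BTreeSet<u64> = batches
        .iter()
        .flat_map(|batch| batch.batch_ids.iter().copied())
        .collect();
    // Debug formatting of a Vec<u64> yields e.g. "[1, 2, 3]".
    let encoded = format!("{:?}", all_ids.into_iter().collect::<Vec<_>>());
    HashMap::from([("batch_ids".to_string(), encoded)])
}
```

Using a sorted set keeps the encoded IDs deterministic and deduplicated even when several buffers contain rows from the same ingested batch.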

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 4 comments.

Summary per file:

  • crates/modelardb_storage/src/write_ahead_log.rs: New WAL implementation with segmented IPC files, batch tracking, and comprehensive tests
  • crates/modelardb_storage/src/lib.rs: Added WAL module and WRITE_AHEAD_LOG_FOLDER constant
  • crates/modelardb_storage/src/error.rs: Added InvalidState error variant
  • crates/modelardb_storage/src/data_folder/mod.rs: Added location() accessor, batch_ids parameter to write method, and commit metadata support
  • crates/modelardb_server/src/storage/mod.rs: Integrated WAL into StorageEngine and added insert_data_points_with_batch_id
  • crates/modelardb_server/src/context.rs: WAL initialization, replay on startup, and table create/drop integration
  • crates/modelardb_server/src/main.rs: Replaced spilled buffer initialization with WAL replay
  • crates/modelardb_server/src/storage/compressed_data_manager.rs: Marks batches as persisted in WAL after saving to disk
  • crates/modelardb_server/src/storage/compressed_data_buffer.rs: Added batch ID tracking to compressed data buffers
  • crates/modelardb_server/src/storage/uncompressed_data_manager.rs: Replaced spilled buffer recovery with deletion and added batch ID propagation
  • crates/modelardb_server/src/storage/uncompressed_data_buffer.rs: Added batch ID tracking to in-memory and on-disk buffers
  • crates/modelardb_server/src/storage/data_transfer.rs: Updated call sites with empty batch IDs for remote transfers
  • crates/modelardb_embedded/src/operations/data_folder.rs: Updated call sites with empty batch IDs
  • crates/modelardb_server/src/configuration.rs: Updated test setup for WAL
  • Cargo.toml / Cargo.lock: Added serde_json and tracing dependencies
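The persist-tracking step in compressed_data_manager.rs (marking batches as persisted after saving to disk) could look roughly like the following sketch, in which a closed segment becomes deletable once every batch ID it covers has been persisted. `ClosedSegment`, `PersistTracker`, and `mark_persisted` are hypothetical names, only closed segments are tracked, and the sketch returns deletable segments instead of touching files.

```rust
use std::collections::BTreeSet;

/// Hypothetical closed segment covering batch IDs in `start..end` (end exclusive).
#[derive(Debug, Clone, PartialEq)]
pub struct ClosedSegment {
    pub start: u64,
    pub end: u64,
}

/// Hypothetical persist tracker for one table's WAL.
pub struct PersistTracker {
    pub segments: Vec<ClosedSegment>,
    persisted: BTreeSet<u64>,
}

impl PersistTracker {
    pub fn new(segments: Vec<ClosedSegment>) -> Self {
        Self { segments, persisted: BTreeSet::new() }
    }

    /// Called after compressed data containing `batch_ids` has been saved to disk.
    /// Returns the segments that became fully persisted and can be deleted.
    pub fn mark_persisted(&mut self, batch_ids: &[u64]) -> Vec<ClosedSegment> {
        self.persisted.extend(batch_ids.iter().copied());

        // Split segments into fully persisted (deletable) and still needed.
        let persisted = &self.persisted;
        let (deletable, remaining): (Vec<_>, Vec<_>) = self
            .segments
            .drain(..)
            .partition(|segment| (segment.start..segment.end).all(|id| persisted.contains(&id)));
        self.segments = remaining;

        // IDs belonging to deleted segments no longer need to be tracked.
        for segment in &deletable {
            for id in segment.start..segment.end {
                self.persisted.remove(&id);
            }
        }
        deletable
    }
}
```

Batches can be persisted out of order, so a segment is only released once the last of its batch IDs arrives, regardless of the order in which the compressed buffers were saved.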


@CGodiksen CGodiksen requested a review from skejserjensen March 13, 2026 14:16
@CGodiksen CGodiksen requested a review from chrthomsen March 24, 2026 14:10
@CGodiksen CGodiksen requested a review from chrthomsen March 27, 2026 13:56
@CGodiksen CGodiksen merged commit f4fc3c2 into main Mar 27, 2026
6 of 9 checks passed
@CGodiksen CGodiksen deleted the dev/write-ahead-log branch March 27, 2026 16:36
4 participants